We consider the standard Neyman-Pearson hypothesis test of a signal-plus-background hypothesis and a
background-only hypothesis in the presence of uncertainty on the background-only prediction. Surprisingly,
this problem has not been addressed in the recent conferences on statistical techniques in high-energy physics,
although its confidence-interval equivalent has been. We discuss the issues of power, similar tests, coverage,
and ordering rules. The method presented is compared to the Cousins-Highland technique, the ratio of Poisson
means, and the "profile" method.



1. Introduction 

In the last five years there have been several
conferences on statistics for particle physics. Much of
the emphasis of these conferences was on limit
setting and the Feldman-Cousins "unified approach", the
quintessential frequentist method based on the
Neyman construction. As particle physicists prepare for
the Large Hadron Collider (LHC) at CERN, we will
need to reexamine our list of statistical tools in the
context of discovery. In fact, there has been no
presentation at these statistical conferences on
frequentist hypothesis testing in the presence of uncertainty
on the background.

In Section 2 we will review the Neyman-Pearson
theory for testing between two simple hypotheses, and
examine the impact of background uncertainty in
Section 3. In Sections 4 and 5 we will present a fully
frequentist method for hypothesis testing with background
uncertainty based on the Neyman construction. In
the remainder of the text we will present an example
and compare this method to other existing methods.



2. Simple Hypothesis Testing 

In the case of simple hypothesis testing, the
Neyman-Pearson theory (which we review briefly for
completeness) begins with two hypotheses: the null
hypothesis $H_0$ and the alternate hypothesis $H_1$.
These hypotheses are called simple because they have
no free parameters. Predictions of some physical
observable $x$ can be made with these hypotheses and
described by the likelihood functions $L(x|H_0)$ and
$L(x|H_1)$ (for simplicity, think of $x$ as the number of
events observed).

Next, one defines a region $W \subset I$, where $I$ is the
space of possible outcomes, such that if the
data fall in $W$ we accept $H_0$ (and reject $H_1$).
Conversely, if the data fall in $I - W$ we reject $H_0$ and
accept $H_1$. The probability to commit a Type I
error is called the size of the test and is given by

$$\alpha = \int_{I-W} L(x|H_0)\,dx. \tag{1}$$



The probability to commit a Type II error is given by

$$\beta = \int_{W} L(x|H_1)\,dx. \tag{2}$$

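Finally, the Neyman-Pearson lemma tells us that the
region $W$ of size $\alpha$ which minimizes the rate of Type II
error (maximizes the power) is given by

$$W = \left\{ x \;\middle|\; \frac{L(x|H_1)}{L(x|H_0)} < k_\alpha \right\}, \tag{3}$$

where the constant $k_\alpha$ is fixed by the requirement
that the test have size $\alpha$.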
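As a rough illustration of the lemma, consider a counting experiment in which $x$ is Poisson distributed with mean $b$ under $H_0$ and $s + b$ under $H_1$. The likelihood ratio $L(x|H_1)/L(x|H_0)$ then grows monotonically with $x$, so the acceptance region is simply $x < x_{crit}$. The sketch below is only a minimal example under that assumption; the function names and the numerical values of $b$, $s$, and $\alpha$ are illustrative choices, not part of the method.

```python
import math

def poisson_pmf(k, mu):
    """Poisson probability P(X = k) for mean mu, computed in log space."""
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))

def poisson_tail(k, mu):
    """P(X >= k), summed directly so that very small tails keep their precision."""
    upper = int(max(k, mu) + 40 * math.sqrt(mu) + 40)  # truncation point (assumption)
    return sum(poisson_pmf(j, mu) for j in range(k, upper + 1))

def critical_count(b, alpha):
    """Smallest x_crit such that rejecting H0 for x >= x_crit has size <= alpha."""
    x_crit = int(b)
    while poisson_tail(x_crit, b) > alpha:
        x_crit += 1
    return x_crit

b, s, alpha = 100.0, 50.0, 2.85e-7     # illustrative values, no background uncertainty
x_crit = critical_count(b, alpha)
size = poisson_tail(x_crit, b)         # Type I error rate actually achieved
power = poisson_tail(x_crit, s + b)    # 1 - beta, evaluated under H1
print(x_crit, size, power)
```

Because $x$ is discrete, the achieved size is in general somewhat smaller than the requested $\alpha$.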

3. Nuisance Parameters 

Within physics, the majority of the emphasis on
statistics has been on limit setting, which can be
translated to hypothesis testing through a well-known
dictionary. When one includes nuisance parameters
$\theta_s$ (parameters that are not of interest or not
observable to the experimenter) into the calculation
of a confidence interval, one must ensure coverage for
every value of the nuisance parameter. When one is
interested in hypothesis testing, there is no longer a
physics parameter $\theta_r$ to cover; instead one must ensure
the rate of Type I error is bounded by some predefined
value. Analogously, when one includes nuisance
parameters in the null hypothesis, one must ensure that
the rate of Type I error is bounded for every value
of the nuisance parameter. Ideally one can find an
acceptance region $W$ which has the same size for all
values of the nuisance parameter (i.e. a similar test).
Furthermore, the power of a region $W$ also depends
on the nuisance parameter; ideally, we would like to
maximize the power for all values of the nuisance
parameter (i.e. Uniformly Most Powerful). Such tests
do not exist in general.
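To see why the Type I error rate must be checked for every value of the nuisance parameter, one can evaluate the size of a single fixed rejection rule under several assumed backgrounds. The sketch below assumes a Poisson counting experiment; the cut `x_cut = 155` and the list of background values are hypothetical inputs chosen only for illustration.

```python
import math

def poisson_tail(k, mu):
    """P(X >= k) for a Poisson mean mu, summed term by term in log space."""
    upper = int(max(k, mu) + 40 * math.sqrt(mu) + 40)
    return sum(math.exp(j * math.log(mu) - mu - math.lgamma(j + 1))
               for j in range(k, upper + 1))

x_cut = 155        # hypothetical fixed rule: reject H0 when x >= x_cut
alpha = 2.85e-7    # target size (illustrative)

# Type I error rate of the same rule for several assumed backgrounds b.
sizes = {b: poisson_tail(x_cut, b) for b in (95.0, 100.0, 105.0, 110.0)}
for b, size in sizes.items():
    print(f"b = {b:5.1f}  size = {size:.3e}  within target: {size <= alpha}")
```

A rule tuned for one value of $b$ exceeds the target size as soon as the true background is somewhat larger, which is the reason the construction has to be repeated over the nuisance parameter.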

In this note, we wish to address how the standard
hypothesis test is modified by uncertainty on the
background prediction. The uncertainty in the background
prediction represents the presence of a nuisance
parameter: for example, let us assume it is the expected
background $b$. Typically, an auxiliary, or side-band,
measurement is made to provide a handle on the
nuisance parameter. Let us generically call that
measurement $M$ and $L(M|H_0,b)$ the prediction of that
measurement given the null hypothesis with nuisance
parameter $b$. In Section 8 we address the special case
that $L(M|H_0,b)$ is a Poisson distribution.

4. The Neyman-Construction 

Usually one does not consider an explicit Neyman
construction when performing hypothesis testing
between two simple hypotheses, though one exists
implicitly. Because of the presence of the nuisance
parameter, the implicit Neyman construction must be
made explicit and the dimensionality increased. The
basic idea is that for each value of the nuisance
parameters $\theta_s$, one must construct an acceptance interval
(for $H_0$) in a space which includes their
corresponding auxiliary measurements $M$ and the original test
statistic $x$ which was being used to test $H_0$ against
$H_1$.

For the simple case introduced in the previous
section, this requires a three-dimensional construction
with $b$, $M$, and $x$. For each value of $b$, one must
construct a two-dimensional acceptance region $W_b$ of
size $\alpha$ (under $H_0$). If an experiment's data $(x_0, M_0)$
fall into an acceptance region $W_b$, then one cannot
exclude the null hypothesis with $100(1-\alpha)\%$
confidence. Conversely, to reject the null hypothesis (i.e.
claim a discovery) the data must not lie in any
acceptance region $W_b$. Said yet another way, to claim
a discovery, the confidence interval for the nuisance
parameter(s) must be empty (when the construction
is made assuming the null hypothesis).

5. The Ordering Rule 

The basic criterion for discovery was discussed
abstractly in the previous section. In order to provide
an actual calculation, one must provide an ordering
rule: an algorithm which decides how to choose the
region $W_b$. Recall that the constraint on Type I
error does not uniquely specify an acceptance region
for $H_0$. In the Neyman-Pearson lemma, it is the
alternate hypothesis $H_1$ that breaks the symmetry
between possible acceptance regions. Also in the unified
approach, it is the likelihood ratio that is used as an
ordering rule.

At the Workshop on Confidence Limits at Fermilab,
Feldman showed that the Unified Method with Nuisance
Parameters is in Kendall's Theory (the chapter on
likelihood ratio tests & test efficiency). The
notation used by Kendall is given in Table I. Also, Kendall
identifies $H_0$ with $\theta_r = \theta_{r0}$ and $H_1$ with $\theta_r \neq \theta_{r0}$.

Let us briefly quote from Kendall:

"Now consider the Likelihood Ratio

$$l = \frac{L(x|\theta_{r0}, \hat{\hat{\theta}}_s)}{L(x|\hat{\theta}_r, \hat{\theta}_s)} \tag{4}$$

Intuitively $l$ is a reasonable test statistic
for $H_0$: it is the maximum likelihood under
$H_0$ as a fraction of its largest possible
value, and large values of $l$ signify that $H_0$
is reasonably acceptable."

Variable                            Meaning
$\theta_r$                          physics parameters
$\theta_s$                          nuisance parameters
$\hat{\theta}_r, \hat{\theta}_s$    unconditionally maximize $L(x|\theta_r, \theta_s)$
$\hat{\hat{\theta}}_s$              conditionally maximize $L(x|\theta_{r0}, \theta_s)$

Table I: The notation used by Kendall for likelihood tests
with nuisance parameters

Feldman uses this chapter as motivation for the
profile method (see Section 9), though in Kendall's book
the same likelihood ratio is used as an ordering rule
for each value of the nuisance parameter.

The author tried simple variations on this
ordering rule before rediscovering it as written. It is worth
pointing out that Eq. 4 is independent of the nuisance
parameter $b$; however, the contour of $l_\alpha$ which
provides an acceptance region of size $\alpha$ is not necessarily
independent of $b$. It is also worth pointing out that
$\hat{\theta}_r$ and $\hat{\theta}_s$ do not consider the null hypothesis; if they
did, the region in which $l = 1$ may be larger than
$(1 - \alpha)$. Finally, if one uses $\theta_s$ instead of $\hat{\theta}_s$ or $\hat{\hat{\theta}}_s$, one
will not obtain tests which are approximately similar.



6. An Example 

Let us consider the case when the nuisance
parameter is the expected number of background events $b$
and $M$ is an auxiliary measurement of $b$.
Furthermore, let us assume that we have an absolute prediction
of the number of signal events $s$. For our test
statistic we choose the number of events observed $x$, which
is Poisson distributed with mean $\mu = b$ for $H_0$ and
$\mu = s + b$ for $H_1$. In the construction there are no
assumptions about $L(M|H_0,b)$; it could be some very
complicated shape relating particle identification
efficiencies, Monte Carlo extrapolation, etc. In the case
where $L(M|H_0,b)$ is a Poisson distribution, other
solutions exist (see Section 8). For our example, let us
take $L(M|H_0,b)$ to be a Normal distribution centered
on $b$ with standard deviation $\Delta b$, where $\Delta$ is some
relative systematic error. Additionally, let us assume
that we can factorize $L(x,M|H,b) = L(x|H,b)\,L(M|b)$
(where $H$ is either $H_0$ or $H_1$).

For our example problem, we can re-write the
ordering rule in Eq. 4 as

$$l = \frac{L(x,M|H_0,\hat{\hat{b}})}{L(x,M|H_1,\hat{b})}, \tag{5}$$

where $\hat{b}$ conditionally maximizes $L(x,M|H_1,b)$ and $\hat{\hat{b}}$
conditionally maximizes $L(x,M|H_0,b)$.

Figure 1: The Neyman construction for a test statistic $x$,
an auxiliary measurement $M$, and a nuisance parameter
$b$. Vertical planes represent acceptance regions $W_b$ for $H_0$
given $b$. The condition for discovery corresponds to data
$(x_0, M_0)$ that do not intersect any acceptance region.
The contours of $L(x,M|H_0,b)$ are in color.

Now let us take $s = 50$ and $\Delta = 5\%$, both of which
could be determined from Monte Carlo. In our toy
example, we collect data $M_0 = 100$. Let $\alpha = 2.85 \cdot 10^{-7}$,
which corresponds to $5\sigma$. The question now is how
many events $x$ must we observe to claim a discovery?$^1$
The condition for discovery is that $(x_0, M_0)$ do not lie
in any acceptance region $W_b$. In Fig. 1 a sample of
acceptance regions is displayed. One can imagine a
horizontal plane at $M_0 = 100$ slicing through the
various acceptance regions. The condition for discovery
is that $x_0 > x_{\max}$, where $x_{\max}$ is the maximal $x$ in the
intersection.
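The two conditional maximizations that enter the ordering rule of Eq. 5 can be sketched numerically. The example below assumes the Poisson-times-Normal likelihood of this section with $s = 50$ and $\Delta = 5\%$, and uses a golden-section search over a fixed bracket in $b$; the bracket, the function names, and the trial observation $x = 167$ are assumptions made for illustration only.

```python
import math

S, DELTA = 50.0, 0.05   # signal expectation and relative background uncertainty

def log_like(x, m, b, s):
    """log L(x, M | H, b): Poisson(x; s + b) times Normal(M; b, DELTA * b)."""
    mu, sigma = s + b, DELTA * b
    log_poisson = x * math.log(mu) - mu - math.lgamma(x + 1)
    log_normal = -math.log(sigma * math.sqrt(2 * math.pi)) - 0.5 * ((m - b) / sigma) ** 2
    return log_poisson + log_normal

def profile_b(x, m, s, iters=60):
    """Golden-section search for the b maximizing log_like (assumes one peak in the bracket)."""
    lo, hi = 0.3 * m, 3.0 * m            # search bracket, an assumption of this sketch
    g = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = hi - g * (hi - lo), lo + g * (hi - lo)
    fc, fd = log_like(x, m, c, s), log_like(x, m, d, s)
    for _ in range(iters):
        if fc < fd:                      # maximum lies in [c, hi]
            lo, c, fc = c, d, fd
            d = lo + g * (hi - lo)
            fd = log_like(x, m, d, s)
        else:                            # maximum lies in [lo, d]
            hi, d, fd = d, c, fc
            c = hi - g * (hi - lo)
            fc = log_like(x, m, c, s)
    b_best = 0.5 * (lo + hi)
    return b_best, log_like(x, m, b_best, s)

def log_ratio(x, m):
    """log l = log L(x, M | H0, b_hathat) - log L(x, M | H1, b_hat), as in Eq. 5."""
    _, ll0 = profile_b(x, m, 0.0)
    _, ll1 = profile_b(x, m, S)
    return ll0 - ll1

x_trial, m_obs = 167, 100.0              # trial inputs for illustration
b_hathat, _ = profile_b(x_trial, m_obs, 0.0)   # conditional maximum under H0
b_hat, _ = profile_b(x_trial, m_obs, S)        # conditional maximum under H1
print(b_hathat, b_hat, log_ratio(x_trial, m_obs))
```

For these inputs the fit under $H_0$ is pulled well above $M_0$ by the large count, while the fit under $H_1$ stays close to $M_0$, so $\log l$ comes out negative.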

There is one subtlety which arises from the
ordering rule in Eq. 5. The acceptance region
$W_b = \{(x,M) \mid l > l_\alpha\}$ is bounded by a contour of the
likelihood ratio and must satisfy the constraint of size:
$\int_{W_b} L(x,M|H_0,b)\,dx\,dM = (1 - \alpha)$. While it is true that
the likelihood ratio is independent of $b$, the constraint on
size is dependent upon $b$. Similar tests are achieved
when $l_\alpha$ is independent of $b$. The contours of the
likelihood ratio are shown in Fig. 2 together with
contours of $L(x,M|H_0,b)$. While tests are roughly
similar for $b \approx M$, similarity is violated for $M \not\approx b$.
This violation should be irrelevant because clearly
$b \ll M$ should not be accepted. This problem can
be avoided by clipping the acceptance region around
$M = b \pm N\Delta b$, where $N$ is sufficiently large ($\approx 10$)
to have negligible effect on the size of the acceptance
region. Fig. 1 shows the acceptance region with this
slight modification.

$^1$In practice, one would measure $x_0$ and $M_0$ and then ask,
"have we made a discovery?". For the sake of explanation, we
have broken this process into two pieces.

Figure 2: Contours of the likelihood $L(x,M|H_0,b)$ are
shown as concentric ellipses for $b = 32$ and $b = 80$.
Contours of the likelihood ratio in Eq. 5 are shown as
diagonal lines. This figure schematically illustrates that if
one chooses acceptance regions based solely on contours
of the likelihood ratio, similarity is badly violated.
For example, data $M = 80$, $x = 130$ would be considered
part of the acceptance region for $b = 32$, even though it
should clearly be ruled out.

In the case where $s = 50$, $\Delta = 5\%$, and $M_0 = 100$,
one must observe 167 events to claim a discovery.
While no figure is provided, the range of $b$
consistent with $M_0 = 100$ (and no constraint on $x$) is
$b \in [68, 200]$. In this range, the tests are similar to
a very high degree.
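The construction of this section can be sketched on a discrete grid. The example below is a deliberately coarse illustration and not a reproduction of the calculation that yields 167: it assumes a loose $\alpha = 0.05$ so that it runs quickly, a unit grid in $M$, a clipping of $N = 4$, a three-point grid in $b$, and a golden-section search for the conditional maxima. All function names and numerical settings are assumptions of the sketch.

```python
import math
from functools import lru_cache

S, DELTA, ALPHA = 50.0, 0.05, 0.05   # ALPHA is deliberately loose to keep the sketch fast

def log_like(x, m, b, s):
    """log L(x, M | H, b): Poisson(x; s + b) times Normal(M; b, DELTA * b)."""
    mu, sigma = s + b, DELTA * b
    return (x * math.log(mu) - mu - math.lgamma(x + 1)
            - math.log(sigma * math.sqrt(2 * math.pi)) - 0.5 * ((m - b) / sigma) ** 2)

def profile_max(x, m, s, iters=40):
    """max over b of log_like by golden-section search (assumes a single peak in the bracket)."""
    lo, hi = 0.3 * m, 3.0 * m
    g = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = hi - g * (hi - lo), lo + g * (hi - lo)
    fc, fd = log_like(x, m, c, s), log_like(x, m, d, s)
    for _ in range(iters):
        if fc < fd:                      # maximum lies in [c, hi]
            lo, c, fc = c, d, fd
            d = lo + g * (hi - lo)
            fd = log_like(x, m, d, s)
        else:                            # maximum lies in [lo, d]
            hi, d, fd = d, c, fc
            c = hi - g * (hi - lo)
            fc = log_like(x, m, c, s)
    return max(fc, fd)

@lru_cache(maxsize=None)
def log_ratio(x, m):
    """Ordering rule of Eq. 5 (log l); it does not depend on b, so values are cached."""
    return profile_max(x, m, 0.0) - profile_max(x, m, S)

def acceptance_region(b, n_clip=4.0, x_range=range(50, 181)):
    """Grid points (x, M) of W_b: highest l first, until probability 1 - ALPHA is reached."""
    m_lo = math.ceil(b - n_clip * DELTA * b)
    m_hi = math.floor(b + n_clip * DELTA * b)
    points = []
    for x in x_range:
        for m in range(m_lo, m_hi + 1):  # unit grid in M, clipped to b +/- N * DELTA * b
            weight = math.exp(log_like(x, m, b, 0.0))   # probability under H0
            points.append((log_ratio(x, m), weight, x, m))
    points.sort(reverse=True)
    total = sum(p[1] for p in points)
    region, covered = set(), 0.0
    for _, weight, x, m in points:
        region.add((x, m))
        covered += weight
        if covered >= (1.0 - ALPHA) * total:
            break
    return region

def discovery_threshold(m_obs, b_grid):
    """x_max + 1, where x_max is the largest x in the slice M = m_obs over all W_b."""
    x_max = max(x for b in b_grid for (x, m) in acceptance_region(b) if m == m_obs)
    return x_max + 1

threshold = discovery_threshold(100, b_grid=(90.0, 100.0, 110.0))
print(threshold)
```

Caching the ordering rule reflects the point made above: the likelihood ratio is computed once per $(x, M)$, and only the cut $l_\alpha$ has to be recomputed for each $b$. A realistic calculation at $5\sigma$ would need a much finer treatment of the tails and a denser grid in $b$.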

7. The Cousins-Highland Technique 

The Cousins-Highland approach to hypothesis
testing is quite popular because it is a simple
smearing on the nuisance parameter. In particular, the
background-only hypothesis $L(x|H_0,b)$ is transformed
from a compound hypothesis with nuisance parameter
$b$ to a simple hypothesis $L'(x|H_0)$ by

$$L'(x|H_0) = \int_b L(x|H_0,b)\,L(b)\,db, \tag{6}$$

where $L(b)$ is typically a normal distribution. The
problem with this method is largely philosophical:
$L(b)$ is meaningless in a frequentist formalism. In a
Bayesian formalism one can obtain $L(b)$ by
considering $L(M|b)$ and inverting it with the use of Bayes's
theorem and the a priori likelihood for $b$. Typically,
$L(M|b)$ is normal and one assumes a flat prior on $b$.

In the case where $s = 50$ and $L(b)$ is a normal
distribution with mean $\mu = M_0 = 100$ and standard deviation
$\sigma = \Delta M_0 = 5$, one must observe 161 events to claim a
discovery. Initially, one might think that 161 is quite
close to 167; however, they differ at the 4% level and
the methods are only considering a $\Delta = 5\%$ effect.
Still worse, if $H_0$ is true (say $b_t = 100$) and one can
claim a discovery with the Cousins-Highland method
($x_0 > 161$), the chance that one could not claim a
discovery with the fully frequentist method ($x_0 < 167$)
is $\approx 95\%$. Similarly, if $H_1$ is true and one can claim
a discovery with the Cousins-Highland method, the
chance that one could not claim a discovery with the
fully frequentist method is $\approx 50\%$. Even practically,
there is quite a difference between these two methods.
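The smearing of Eq. 6 is straightforward to sketch numerically. The example below assumes a Normal $L(b)$ with mean 100 and standard deviation 5, truncated at $\pm 8\sigma$ and integrated with the trapezoid rule; the truncation, the step size, and the function names are choices made for this illustration. The resulting threshold depends on such details and on the convention used for the rejection region ($x \geq x_{crit}$ here), so it need not coincide exactly with the number quoted above.

```python
import math

def poisson_pmf(k, mu):
    """Poisson probability P(X = k) for mean mu, computed in log space."""
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))

def smeared_pmf(x_max, mean, sigma, n_sigma=8.0, step=0.1):
    """L'(x|H0) for x = 0..x_max: Poisson(x; b) averaged over a Normal L(b), as in Eq. 6."""
    n = int(round(2 * n_sigma * sigma / step))
    grid = [mean - n_sigma * sigma + i * step for i in range(n + 1)]
    # Trapezoid weights times the (unnormalized) Normal density on the grid.
    w = [math.exp(-0.5 * ((b - mean) / sigma) ** 2) for b in grid]
    w[0] *= 0.5
    w[-1] *= 0.5
    norm = sum(w)
    return [sum(wi * poisson_pmf(x, b) for wi, b in zip(w, grid)) / norm
            for x in range(x_max + 1)]

def critical_count(pmf, alpha):
    """Smallest x_crit with P(X >= x_crit) <= alpha under the given pmf."""
    tail = 0.0
    x_crit = len(pmf)
    for x in range(len(pmf) - 1, -1, -1):   # accumulate the tail from the top
        tail += pmf[x]
        if tail > alpha:
            break
        x_crit = x
    return x_crit

alpha = 2.85e-7
pmf = smeared_pmf(x_max=400, mean=100.0, sigma=5.0)
x_crit_smeared = critical_count(pmf, alpha)
print(x_crit_smeared)
```

The smeared distribution is broader than a Poisson distribution with fixed $b = 100$, so the threshold moves up relative to the case without background uncertainty.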



8. The Ratio of Poisson Means 

During the conference, J. Linnemann presented
results on the ratio of Poisson means. In that case,
one considers a background and a signal process, both
with unknown means. By making "on-source" (i.e.
$x$) and "off-source" (i.e. $M$) measurements one can
form a confidence interval on the ratio $\lambda = s/b$. If
the $100(1-\alpha)\%$ confidence interval for $\lambda$ does not
include 0, then one could claim discovery. This approach
does take into account uncertainty on the background;
however, it is restricted to the case in which $L(M|b)$
is a Poisson distribution.

There are two variations on this technique. The
first technique has been known for quite some time
and was first brought to physics in Ref. [?]. This
approach conditions on $x + M$, which allows one to tackle
the problem with the use of a binomial distribution.
Later, Cousins improved on these limits by removing
the conditioning and considering the full Neyman
construction. Cousins' paper has an excellent review
of the literature for those interested in this technique.
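The conditioning on $x + M$ can be illustrated with a short sketch. Under the background-only hypothesis, with the on-source count Poisson distributed with mean $b$ and the off-source count Poisson distributed with mean $\tau b$, the on-source count given the total $n = x + M$ follows a binomial distribution with probability $1/(1+\tau)$. The exposure ratio `tau` and the function names are assumptions introduced for this example, and the sketch only evaluates a one-sided p-value for $\lambda = 0$ rather than a full confidence interval on $\lambda$.

```python
import math

def log_binom_pmf(k, n, p):
    """log of the binomial probability of k successes in n trials."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1.0 - p))

def conditional_p_value(x, m, tau=1.0):
    """P(X >= x | X + M = x + m) under the background-only hypothesis.

    tau is the assumed ratio of off-source to on-source exposure, so that the
    off-source count has mean tau * b; it is a parameter of this sketch.
    """
    n = x + m
    p = 1.0 / (1.0 + tau)    # chance that a background event lands on-source
    return sum(math.exp(log_binom_pmf(k, n, p)) for k in range(x, n + 1))

p_excess = conditional_p_value(130, 100)   # on-source excess
p_null = conditional_p_value(100, 100)     # no excess
print(p_excess, p_null)
```

The unknown background $b$ drops out of the conditional distribution, which is why this approach accounts for the background uncertainty without reference to a prior on $b$.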



9. The Profile Method

As was mentioned in Section 5, the likelihood ratio
in Eq. 4 is independent of the nuisance parameters.
If it were not for the violations in similarity between
tests, one would only need to perform the
construction for one value of the nuisance parameters. Clearly,
$\hat{\hat{\theta}}_s$ is an appropriate choice to perform the
construction. This is the logic behind the profile method. It
should be pointed out that the profile method is an
approximation to the full Neyman construction, though
a particularly good one. In the example above with
$x_0 = 167$, $M_0 = 100$, the construction would be made
at $b = \hat{\hat{b}} = 117$, which gives the identical result as the
fully frequentist method.

The main advantage of the profile method is that
of speed and scalability. Instead of performing the
construction for every value of the nuisance
parameters, one must only perform the construction once.
For many variables, the fully frequentist method is
not scalable if one naively loops over a fixed grid.
However, Monte Carlo sampling the nuisance
parameters does not suffer from the curse of dimensionality
and serves as a more robust approximation of the full
construction than the profile method.

10. Conclusion

We have presented a fully frequentist method for
hypothesis testing. The method consists of a
Neyman construction in each of the nuisance
parameters, their corresponding auxiliary measurements, and
the test statistic that was originally used to test $H_0$
against $H_1$. We have chosen as an ordering rule the
likelihood ratio with the nuisance parameters
conditionally maximized to their respective hypotheses.
With a slight modification, this ordering rule produces
tests that are approximately similar. We have
compared this method to the most common methods in
the field. This method is philosophically more sound
than the Cousins-Highland technique and more
general than the ratio of Poisson means. This method
can be made computationally less intensive either with
Monte Carlo sampling of the nuisance parameters or
by the approximation known as the profile method.